Frequent subgraph mining (FSM) is an important task for exploratory dataanalysis on graph data. Over the years, many algorithms have been proposed tosolve this task. These algorithms assume that the data structure of the miningtask is small enough to fit in the main memory of a computer. However, as thereal-world graph data grows, both in size and quantity, such an assumption doesnot hold any longer. To overcome this, some graph database-centric methods havebeen proposed in recent years for solving FSM; however, a distributed solutionusing MapReduce paradigm has not been explored extensively. Since, MapReduce isbecoming the de- facto paradigm for computation on massive data, an efficientFSM algorithm on this paradigm is of huge demand. In this work, we propose afrequent subgraph mining algorithm called MIRAGE which uses an iterativeMapReduce based framework. MIRAGE is complete as it returns all the frequentsubgraphs for a given user-defined support, and it is efficient as it appliesall the optimizations that the latest FSM algorithms adopt. Our experimentswith real life and large synthetic datasets validate the effectiveness ofMIRAGE for mining frequent subgraphs from large graph datasets. The source codeof MIRAGE is available from www.cs.iupui.edu/alhasan/software/
展开▼